Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

**BREAKING** pygmt.grdcut: Refactor to store output in virtualfiles for grids #3115

Open
wants to merge 98 commits into
base: main
Choose a base branch
from

Conversation

seisman
Copy link
Member

@seisman seisman commented Mar 17, 2024

This PR applies virtualfiles in pygmt.grdcut, so that this function doesn't have to write to temporary files (#2730).

grdcut is on of the complicated module that it works for grid, image and cube. We have to know the kind of the input before calling the Session.virtualfile_out method.

It turns out it's not trivial to determine the data kind of an input file. We have some options:

  1. Based on file extensions (e.g., .nc is a grid, .tif is a image)
  2. Based on magic bytes (xref: https://en.wikipedia.org/wiki/List_of_file_signatures)
  3. Try reading as a grid/image/cube. If fail, then try the 2nd one.
  4. Add a new kind parameter (default to "grid") and users should specify the kind parameter.

Option 1-2 may not always work; option 3 also dosn't always work (see #3115 (comment)).

This PR currently adopts option 4. Need to note option 4 meaning BREAKING:

  • previously, grdcut("@earth_day_01d") works; now must use grdcut("@earth_day_01d", kind="image")
  • previously, grdcut works 1/3/4-band images; now it only supports 3-band images. [We can support 1/4-band images later]

seisman and others added 2 commits March 20, 2024 11:00
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
pygmt/clib/session.py Outdated Show resolved Hide resolved
pygmt/src/grdcut.py Outdated Show resolved Hide resolved
@seisman seisman changed the title WIP: Initial try to support raster images using temporary files WIP: pygmt.grdcut: Support both grids and images Mar 27, 2024
@seisman seisman force-pushed the datatypes/gmtgrid branch 3 times, most recently from d8f1160 to d0517e0 Compare April 1, 2024 07:11
@seisman seisman force-pushed the datatypes/gmtgrid branch from d0517e0 to 4ba67df Compare April 1, 2024 11:32
@seisman seisman changed the base branch from datatypes/gmtgrid to main April 1, 2024 11:32
@seisman seisman added the enhancement Improving an existing feature label Apr 19, 2024
weiji14 added a commit that referenced this pull request Jun 18, 2024
Special case `earth_day` to be loaded as GMT_IMAGE rather than GMT_GRID. Need to use `GMT_OUT|GMT_IS_REFERENCE` in virtualfile_out to avoid segfault, xref #3115.
weiji14 added 7 commits June 18, 2024 21:16
Extra metadata from the _GMT_GRID_HEADER struct.
Reorder the dimensions to follow Channel, Height, Width (CHW) convention. Also added doctest checking output DataArray object and the image's x and y coordinates.
Get the registration and gtype info from the grid header and apply it to the GMT accessor attributes.
Trying to match some of the doctests in _GMT_GRID.
Comment on lines 2332 to 2334
# The logic here is because: an image can be read into a grid container, but a grid
# can't be read into an image container. So, try to read the file as an image first.
# If fails, try to read it as a grid.
Copy link
Member Author

@seisman seisman Sep 29, 2024

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It turns out the logic here is not 100% correct.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • A grid can be read into an image container, as long as another libgdal package (maybe libgdal-netcdf or libgdal-hdf5) is installed. xref:

The first try-except check is to try reading into a GMT_IMAGE struct, and return "image" if there are 3 bands, and "grid" if there is only 1 band. Even if grids can be read into GMT_IMAGE, we will still call it a grid if it has 1-band right?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Trying to refresh my memory about what's happening:

  • Reading a grid into GMT_GRID: Works!
  • Reading an image into GMT_GRID: Only the first band is read.
  • Reading a grid into GMT_IMAGE: It depends on libgdal-hdf5 being installed. Read as an single-band image if installed, otherwise raise errors.
  • Reading an image into GMT_IMAGE: works.
import pygmt
pygmt.grdcut("@earth_relief_01d_p", region=[0, 10, 0, 10])

If libgdal-hdf5 is not installed, _raster_kind first read it as an image, which fails and reports following errors:

ERROR 4: `/home/runner/.gmt/server/earth/earth_relief/earth_relief_01d_p.grd' not recognized as being in a supported file format. It could have been recognized by driver HDF5, but plugin gdal_HDF5.so is not available in your installation. You may install it with 'conda install -c conda-forge libgdal-hdf5'
Error: ession [ERROR]: Unable to open earth_relief_01d_p.grd.
Error: ession [ERROR]: ERROR reading image with gdalread.
pygmt-session (gmtapi_import_image): Could not read from file [earth_relief_01d_p.grd]
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)
[Session pygmt-session (43)]: Error returned from GMT API: GMT_IMAGE_READ_ERROR (22)

then it tries to read it as a grid, which succeeds. But we still see the error messages above. I believe we need to find a way to suppress these errors or find an alternative way to determine the data kind.

@seisman seisman removed run/benchmark Trigger the benchmark workflow in PRs run/test-gmt-dev Trigger the GMT Dev Tests workflow in PR labels Sep 30, 2024
@seisman seisman self-assigned this Oct 30, 2024
@seisman seisman added this to the 0.14.0 milestone Dec 9, 2024
@seisman seisman added the needs review This PR has higher priority and needs review. label Dec 9, 2024
@seisman seisman marked this pull request as ready for review December 9, 2024 11:02
outkind = kind
case "_":
msg = f"Unsupported data type {type(grid)}."
raise GMTInvalidInput(msg)
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Lines 117-118 should be caught by the test_grdcut_fails test.

@seisman seisman changed the title pygmt.grdcut: Refactor to store output in virtualfiles for grids **BREAKING** pygmt.grdcut: Refactor to store output in virtualfiles for grids Dec 10, 2024
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
enhancement Improving an existing feature needs review This PR has higher priority and needs review.
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants